Project Description

Finding a nice neighborhood to live in.

Most adults during their life, have atleast made a single change of living location due to personal or professional reasons. However finding a decent place where all the personal needs are met is highly stressful. Some people enjoy doing the detailed analysis by hand thereby figuring out following information like,

  1. the average rent in a neighborhood,
  2. transport amenities (both public and private),
  3. best schools for kids(if they have any, as this is a very important choice),
  4. average commute time from that neighborhood to work place,
  5. nearest hospitals,
  6. crime rates,
  7. police or communal violence,
  8. political or racial tensions,
  9. public health crisis in the last few years,
  10. tax rates
  11. public places like parks,
  12. places to have leisure sctivities like going to restaurants, movie theaters, saunas, gym, etc.,

These are few common things to consider during a relocation and they are very important for a healthy and stress free life. And doing an Online search for these is time consuming. In this project, we aim to meet atleast some common requirements like public places, parks, restaurants etc. and guide the project users to choose atleast 1 or more neighborhoods for their consideration during their relocation. By rating the neighborhoods based upon the available amenities they have, we can recommend the neighborhood in a city in a ranked order. This is the goal of this project.

Target Audience:-
Someone who wants to relocate to a city based on available public services.

Stakeholders:-

  1. Someone who wants to relocate to a city.
  2. Myself.

Data Description

We use public libraries and API's in this project. We use Wikipedia and FourSquare API, Some common Python Libraries for programming.

Wikipedia:-

From wikipedia pages, we can identify the neighborhood around the city. Every major cities have these information in their wiki page. We access the web page and then extract the neighborhood information.

Date Type:- XML and HTML

Duration:-
< 10 seconds

Description of the data:-
Location coordinates obtained by Geocoder calls.

Source:- (https://en.wikipedia.org/wiki/Main_Page)

Foursquare API:

Foursquare provides a valuable and publically accessible location information like the ameneties in nearby locations. We use their developer tools to access the required information about the neighboords in a city. Using these accessed information we then rank the neighborhoods based on the ameneties they have. These services are free of charge.

We create a Foursquare developer account, and after that we provide some zip codes inside a city and for each zip code or LatLon info(Latitude and Longitude Points) we provided we extract details on the ameneties we expect a neighborhood should have. So we set the radius of this search around to zip code to be around 1km.

Date Type:- JSON

Duration:-
N/A

Description of the data:-
Location coordinates obtained by Foursquare API calls.

Source:- (https://foursquare.com/)

Public Programming Tools:-

We use some public plotting tools like Folium to visualize the neighborhoods in the city we want to relocate. Then based upon the analysis of the above combined information we can update the Folium visualization to reflect the number of amenities in a neighborhood.

K-Means Clustering Algorithm on the Data:-

We can use K-Means Clustering algorithm to group amenities in an area, then we can reduce the number of individual amenities comparisons to be done against each neighborhood. We can do these comparisons against the types of amenities, individually, collectively, or alltogether.

Data Preprocessing

In order to learn about the neighborhood we use Wikipedia to Identify the list of neighborhoods in the city of Toronto. This is implemented in the following code.


In [20]:
import numpy as np
import pandas as pd
import requests
import lxml
import folium
import matplotlib.pyplot as plt
import seaborn as sns
import geocoder
import warnings

from tqdm import tqdm
from IPython.display import Image 
from IPython.core.display import HTML
from sklearn.cluster import KMeans
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim
from bs4 import BeautifulSoup

warnings.filterwarnings('ignore')

from tqdm import tqdm
import sys
import linecache

def PrintException():
    exc_type, exc_obj, tb = sys.exc_info()
    f = tb.tb_frame
    lineno = tb.tb_lineno
    filename = f.f_code.co_filename
    linecache.checkcache(filename)
    line = linecache.getline(filename, lineno, f.f_globals)
    print('EXCEPTION IN ({}, LINE {} "{}"): {}'.format(filename, lineno, line.strip(), exc_obj))

%matplotlib inline

In [2]:
# Finding the postals codes, neighborhoods in Toronto, Canada
wiki_page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(wiki_page, 'lxml')
table = soup.find('table')
# table

toronto_table = soup.find('table',{'class':'wikitable sortable'})
links = toronto_table.findAll('td')

pincodes = []
count = 0
for x in links:
    if count == 0:
        x1 = x.text
        count += 1
    elif count == 1:
        x2 = x.text
        count +=1
    elif count == 2:
        x3 = x.text
        x3 = x3.replace('\n','')
        count = 0
        if x3 == 'Not assigned':
            x3 = x2
        if x2 != 'Not assigned':            
            pincodes.append((x1,x2,x3))
# print (pincodes)

result = {}
for x in pincodes:
    if x[0] in result:
        result[x[0]] = [x[0], x[1], result[x[0]][1] + ', ' + x[2]]
    else:
        result[x[0]] = [x[0], x[1], x[2]]
            
results = {}
for count, x in enumerate(result):
    results[count] = [x, result[x][1], result[x][2]]
    
# print(results)

toronto_data = pd.DataFrame.from_dict(results, orient='index', columns=['PostalCode', 'Borough', 'Neighborhood'])
toronto_data['latitude'] = None
toronto_data['longitude'] = None
# toronto_data['response'] = None
toronto_data


Out[2]:
PostalCode Borough Neighborhood latitude longitude
0 M3A North York Parkwoods None None
1 M4A North York Victoria Village None None
2 M5A Downtown Toronto Downtown Toronto, Regent Park None None
3 M6A North York North York, Lawrence Manor None None
4 M7A Queen's Park Queen's Park None None
5 M9A Etobicoke Islington Avenue None None
6 M1B Scarborough Scarborough, Malvern None None
7 M3B North York Don Mills North None None
8 M4B East York East York, Parkview Hill None None
9 M5B Downtown Toronto Downtown Toronto, Garden District None None
10 M6B North York Glencairn None None
11 M9B Etobicoke Etobicoke, West Deane Park None None
12 M1C Scarborough Scarborough, Port Union None None
13 M3C North York North York, Don Mills South None None
14 M4C East York Woodbine Heights None None
15 M5C Downtown Toronto St. James Town None None
16 M6C York Humewood-Cedarvale None None
17 M9C Etobicoke Etobicoke, Old Burnhamthorpe None None
18 M1E Scarborough Scarborough, West Hill None None
19 M4E East Toronto The Beaches None None
20 M5E Downtown Toronto Berczy Park None None
21 M6E York Caledonia-Fairbanks None None
22 M1G Scarborough Woburn None None
23 M4G East York Leaside None None
24 M5G Downtown Toronto Central Bay Street None None
25 M6G Downtown Toronto Christie None None
26 M1H Scarborough Cedarbrae None None
27 M2H North York Hillcrest Village None None
28 M3H North York North York, Wilson Heights None None
29 M4H East York Thorncliffe Park None None
... ... ... ... ... ...
73 M4R Central Toronto North Toronto West None None
74 M5R Central Toronto Central Toronto, Yorkville None None
75 M6R West Toronto West Toronto, Roncesvalles None None
76 M7R Mississauga Canada Post Gateway Processing Centre None None
77 M9R Etobicoke Etobicoke, St. Phillips None None
78 M1S Scarborough Agincourt None None
79 M4S Central Toronto Davisville None None
80 M5S Downtown Toronto Downtown Toronto, University of Toronto None None
81 M6S West Toronto West Toronto, Swansea None None
82 M1T Scarborough Scarborough, Tam O'Shanter None None
83 M4T Central Toronto Central Toronto, Summerhill East None None
84 M5T Downtown Toronto Downtown Toronto, Kensington Market None None
85 M1V Scarborough Scarborough, Steeles East None None
86 M4V Central Toronto Central Toronto, Summerhill West None None
87 M5V Downtown Toronto Downtown Toronto, South Niagara None None
88 M8V Etobicoke Etobicoke, New Toronto None None
89 M9V Etobicoke Etobicoke, Thistletown None None
90 M1W Scarborough Scarborough, Steeles West None None
91 M4W Downtown Toronto Rosedale None None
92 M5W Downtown Toronto Stn A PO Boxes 25 The Esplanade None None
93 M8W Etobicoke Etobicoke, Long Branch None None
94 M9W Etobicoke Northwest None None
95 M1X Scarborough Upper Rouge None None
96 M4X Downtown Toronto Downtown Toronto, St. James Town None None
97 M5X Downtown Toronto Downtown Toronto, Underground city None None
98 M8X Etobicoke Etobicoke, Old Mill North None None
99 M4Y Downtown Toronto Church and Wellesley None None
100 M7Y East Toronto Business Reply Mail Processing Centre 969 Eastern None None
101 M8Y Etobicoke Etobicoke, Sunnylea None None
102 M8Z Etobicoke Etobicoke, South of Bloor None None

103 rows × 5 columns


In [24]:
toronto_data.head()


Out[24]:
PostalCode Borough Neighborhood latitude longitude
0 M3A North York Parkwoods 43.7588 -79.3202
1 M4A North York Victoria Village 43.7327 -79.3112
2 M5A Downtown Toronto Downtown Toronto, Regent Park None None
3 M6A North York North York, Lawrence Manor None None
4 M7A Queen's Park Queen's Park 49.2151 -122.906

Data Analysis

Finding Latitude and Longitude

  1. After learning the neighborhood and its details, we prepare the address of each neighborhood and request Geolocator for their latitude and longitude in the following code.
  2. To verify the accuracy fo the latitude and the longitude values we call do reverse geolocation using the geocoder.

In [4]:
# Fidning the latitude and longitude of the postal codes in toronto
import geocoder
from tqdm import tqdm

# Parkwoods (37.8567738, -122.220687780045)
locations = dict()
for index, data in toronto_data.iterrows():
    address = data["Neighborhood"]
    address =  address + ", " + data["Borough"]  + ", "+ " Canada"# + "Toronto"  + ", " 
    # print(address)
    geolocator = Nominatim(user_agent="coursera_capstone_project", timeout=1000)
    location = geolocator.geocode(address)
    if location:
        locations[address] = (location.latitude, location.longitude)
        toronto_data.loc[index,'latitude'] = location.latitude
        toronto_data.loc[index,'longitude'] = location.longitude
        print(address, (location.latitude, location.longitude))
print("Completed locations: ", len(locations))


Parkwoods, North York,  Canada (43.7587999, -79.3201966)
Victoria Village, North York,  Canada (43.732658, -79.3111892)
Queen's Park, Queen's Park,  Canada (49.21511495, -122.905923850434)
Islington Avenue, Etobicoke,  Canada (43.7039092, -79.54988)
Scarborough, Malvern, Scarborough,  Canada (43.7761255, -79.2584355948573)
Don Mills North, North York,  Canada (43.737178, -79.3434514)
Downtown Toronto, Garden District, Downtown Toronto,  Canada (43.6528208, -79.3767112)
Glencairn, North York,  Canada (43.7087117, -79.4406853)
Etobicoke, West Deane Park, Etobicoke,  Canada (43.6668127, -79.5721652)
Scarborough, Port Union, Scarborough,  Canada (43.776552, -79.140025)
Woodbine Heights, East York,  Canada (43.6999302, -79.3191316)
St. James Town, Downtown Toronto,  Canada (43.6625723, -79.3764109)
Etobicoke, Old Burnhamthorpe, Etobicoke,  Canada (43.643785, -79.59269)
Scarborough, West Hill, Scarborough,  Canada (43.773077, -79.257774)
The Beaches, East Toronto,  Canada (43.6710244, -79.296712)
Woburn, Scarborough,  Canada (43.7598243, -79.2252908)
Leaside, East York,  Canada (43.7047983, -79.3680904)
Central Bay Street, Downtown Toronto,  Canada (43.6603776, -79.38564)
Christie, Downtown Toronto,  Canada (43.6499915, -79.3857217)
Cedarbrae, Scarborough,  Canada (43.75646655, -79.226692442588)
Hillcrest Village, North York,  Canada (43.7996637, -79.3650189)
Thorncliffe Park, East York,  Canada (43.704553, -79.3454074)
West Toronto, Dufferin, West Toronto,  Canada (43.6549842, -79.4335749)
Scarborough Village, Scarborough,  Canada (43.7437422, -79.2116324)
East Toronto, East York,  Canada (43.6247901, -79.3934918)
Scarborough, Kennedy Park, Scarborough,  Canada (43.7169869, -79.2546806)
Bayview Village, North York,  Canada (43.7691966, -79.3766617)
East Toronto, Riverdale, East Toronto,  Canada (43.67692, -79.3390349)
Scarborough, Oakridge, Scarborough,  Canada (43.6912635, -79.2873426)
Downsview West, North York,  Canada (43.7492988, -79.462248)
Downtown Toronto, Victoria Hotel, Downtown Toronto,  Canada (43.6602799, -79.3759504)
Humber Summit, North York,  Canada (43.7600778, -79.5717598)
Scarborough, Scarborough Village West, Scarborough,  Canada (43.7425165, -79.2078754)
North York, Willowdale, North York,  Canada (43.7708175, -79.4132998)
Downsview Central, North York,  Canada (43.7492988, -79.462248)
York, Silverthorn, York,  Canada (43.6817465, -79.4739616)
Scarborough, Cliffside West, Scarborough,  Canada (43.69846545, -79.2515200320226)
Willowdale South, North York,  Canada (43.7753558, -79.4166859823926)
Downsview Northwest, North York,  Canada (43.7492988, -79.462248)
Lawrence Park, Central Toronto,  Canada (43.729199, -79.4032525)
Roselawn, Central Toronto,  Canada (43.7054521, -79.4251817)
York, Runnymede, York,  Canada (43.666155, -79.4877088)
Weston, York,  Canada (43.7001608, -79.5162474)
Scarborough, Wexford Heights, Scarborough,  Canada (43.7485778, -79.3129735)
York Mills West, North York,  Canada (43.7440391, -79.406657)
Davisville North, Central Toronto,  Canada (43.7043123, -79.3885169)
Westmount, Etobicoke,  Canada (43.6936399, -79.5210426)
Scarborough, Wexford, Scarborough,  Canada (43.7598306, -79.2964702)
Willowdale West, North York,  Canada (43.7753558, -79.4166859823926)
Central Toronto, Yorkville, Central Toronto,  Canada (43.6730826, -79.3882891526121)
West Toronto, Roncesvalles, West Toronto,  Canada (43.6393188, -79.446221)
Etobicoke, St. Phillips, Etobicoke,  Canada (43.7014448, -79.5486079)
Agincourt, Scarborough,  Canada (43.7853531, -79.2785494)
Davisville, Central Toronto,  Canada (43.7043123, -79.3885169)
West Toronto, Swansea, West Toronto,  Canada (43.638093, -79.4665843)
Scarborough, Tam O'Shanter, Scarborough,  Canada (43.7791569, -79.304518)
Scarborough, Steeles East, Scarborough,  Canada (43.8019074, -79.3091903204758)
Etobicoke, New Toronto, Etobicoke,  Canada (43.6435559, -79.5656326)
Etobicoke, Thistletown, Etobicoke,  Canada (43.7360174, -79.5624402)
Scarborough, Steeles West, Scarborough,  Canada (43.8019074, -79.3091903204758)
Rosedale, Downtown Toronto,  Canada (43.6563221, -79.3809161)
Etobicoke, Long Branch, Etobicoke,  Canada (43.6435559, -79.5656326)
Northwest, Etobicoke,  Canada (43.6435559, -79.5656326)
Upper Rouge, Scarborough,  Canada (43.8049304, -79.1658374)
Downtown Toronto, St. James Town, Downtown Toronto,  Canada (43.6625723, -79.3764109)
Etobicoke, Old Mill North, Etobicoke,  Canada (43.6435559, -79.5656326)
Church and Wellesley, Downtown Toronto,  Canada (43.6611949, -79.3821143)
Completed locations:  67

In [5]:
print("Completed finding Latitude and Longitude for " + str(len(locations)) + " locations.")


Completed finding Latitude and Longitude for 67 locations.

Using FourSquare for Finding the public places nearby the neighborhoods

  1. For the 64 locations that we have found the LatLon values, we use foursquare to find the public places like coffee shops, bars, other shops, pubs, schools train startion, bus stations, parks , etc.

  2. We use a radius of 1 KM for each neighborhood within which the foursquare api will return the above searched public spaces and their information. We search for atleast 200 such spaces in a neighborhood.

  3. The def get_category_type function analyses the values returned by the foursquare API and filters the venue categories and find out if they matches the public space types that we are looking for.


In [40]:
CLIENT_ID = 'BVUJUDWGVYK0JVR5VRXGN5E2PN5ZWM2J535XGCBHSDBQTNPE'
CLIENT_SECRET = '3FUU2PXF3VP1HH5TIIWHYYOKMAOHLN1VZ4ZR13B4DBOKM5L2'
VERSION = '20180605'

import warnings
warnings.filterwarnings('ignore')
    
search_queries = ['bus', 'coffee', 'bar', 'shop', 'pub', 'school', 'train', 'park', 'hospital', 'police']
radius = 1000
LIMIT = 250

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Fidnign al  the public places available near the geo locations
data = pd.DataFrame()
res_temp = {}
for neighborhood, location in locations.items():
    
    # print(neighborhood, neighborhood.split(",")[-2])
    # te = input("Test")
    temp = pd.DataFrame()
    results = None
    try:
        for query in search_queries:
            # print(query)
            url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, location[0], location[1], VERSION, query, radius, LIMIT)
            results = requests.get(url).json()
            if results:
                venues = results['response']['venues']
                dataframe = json_normalize(venues)
                print(dataframe.shape)
                # dataframe['location.response'] = str(results['response'])
                filtered_columns = ['name', 'categories', 'id'] + [col for col in dataframe.columns if col.startswith('location.')]
                dataframe_filtered = dataframe.loc[:, filtered_columns]
                dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
                dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
                dataframe_filtered['query'] = query
                temp = pd.concat([temp, dataframe_filtered])
        temp['neighborhood'] = neighborhood.split(",")[-2]
        data = pd.concat([data, temp])
        # print(len(data))
        break
    except:
        # PrintException()
        continue
    break
print(data.shape)
data.head()


(12, 17)
(0, 0)
(2, 16)
(0, 0)
(0, 0)
Out[40]:

In [41]:
results


Out[41]:
{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5c5df4a31ed2196ad88e917c'},
 'response': {}}

Note:-

For the 67 places we provided to Foursquare API, we received information on 4554 public places.


In [47]:
# Number of Neighborhoods In Toronto
print(list(set(data["neighborhood"])))
print("Number of Neighborhoods In Toronto", len(data["neighborhood"].unique()))


[' West Toronto', ' Downtown Toronto', ' York', ' North York', ' Central Toronto', ' Scarborough']
Number of Neighborhoods In Toronto 6

Visualizing the Number of Venues Obtained form the FourSquare API


In [41]:
data['neighborhood'] = data['neighborhood'].apply(lambda x: x.split(',')[0])
data.rename(columns={'query': 'Venue Type'}, inplace = True)

sns.set(style='darkgrid')
plt.figure(figsize=(10,15))
neighborhoods = data['neighborhood'].apply(lambda x: x.split(',')[0])
sns.countplot(neighborhoods, data = data, hue = 'Venue Type', order = neighborhoods.value_counts().index)
plt.xticks(rotation=75)
plt.title('Number of venues types in each neighborhood', pad=10, fontsize = 15)
plt.ylabel('Number of venues')
plt.xlabel('Neighborhoods in Toronto - Ordered in Desceneding based on available public places')
plt.savefig("no_of_venues.jpg")
plt.tight_layout()


Observation:- From the above plot, we can visually verify that Downtown Toronto is the top neighborhood with highest number of public spaces.

Scatter Plot of all the venues in Toronto


In [66]:
lat_data = data['lat']
lng_data = data['lng']
neighborhood = data['neighborhood'].apply(lambda x: x.split(',')[0])

plt.scatter(lng_data, lat_data)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Scatter Plot of Public Places in Toronto')
plt.savefig('Public_Places_Scatter_Plot.png')
plt.show()



In [ ]:

Details about the top venue


In [42]:
# Choosing the top two neighborhood by using visual analysis of the above graph
dtoronto = data[data['neighborhood'].apply(lambda x: 'Downtown Toronto' in x)]
dtoronto.head()


Out[42]:
address categories cc city country crossStreet distance formattedAddress id labeledLatLngs lat lng name neighborhood postalCode Venue Type state
0 141 Bay St Bus Station CA Toronto Canada at Front St 799 [141 Bay St (at Front St), Toronto ON M5J 1J5,... 4ba17563f964a520ceb837e3 [{'label': 'display', 'lat': 43.64565538508229... 43.645655 -79.377447 Union Station GO Bus Terminal Downtown Toronto M5J 1J5 bus ON
1 NaN Bus Line CA Toronto Canada NaN 719 [Toronto ON, Canada] 4fad4358e4b00911a7296917 [{'label': 'display', 'lat': 43.65597396661909... 43.655974 -79.384504 Bus To Niagara Falls Downtown Toronto NaN bus ON
2 NaN Bus Line CA Toronto Canada NaN 784 [Toronto ON, Canada] 4bda14603904a5934203459e [{'label': 'display', 'lat': 43.64577099686206... 43.645771 -79.376883 Bramalea Go Bus Downtown Toronto NaN bus ON
3 NaN Bus Line CA NaN Canada NaN 786 [Canada] 4b81e070f964a5204bc230e3 [{'label': 'display', 'lat': 43.64575296961094... 43.645753 -79.376729 Lincolnville GO Bus - Northbound Downtown Toronto NaN bus NaN
4 NaN Bus Line CA Toronto Canada NaN 835 [Toronto ON, Canada] 4bf5b4e894b2a593e424acee [{'label': 'display', 'lat': 43.64811312219104... 43.648113 -79.384790 TTC Bus #143 - Beach Express Downtown Toronto NaN bus ON

Visulaization of the Neighborhoods in Toronto

Here we use Folium Map visualization tool to see the neighborhoods in Totonto


In [90]:
general_location = geolocator.geocode('Toronto')
geolocator = Nominatim(user_agent="coursera_capstone_project", timeout=1000)
venues_map = folium.Map(location=[43.653908, -79.384293], zoom_start=11)

for neighborhood, location in locations.items():
    folium.CircleMarker(
        [location[0], location[1]],
        radius=5,
        color='red',
        popup=neighborhood,
        fill = True,
        fill_color = 'red',
        fill_opacity = 0.6
    ).add_to(venues_map)
    
venues_map


Out[90]:

Visualizing public locations only in Downtown Toronto Neighborhood


In [8]:
downtown_data = data[data['neighborhood'] == ' Downtown Toronto']
downtown_data.head()


Out[8]:
address categories cc city country crossStreet distance formattedAddress id labeledLatLngs lat lng name neighborhood postalCode Venue Type state
0 141 Bay St Bus Station CA Toronto Canada at Front St 799 ['141 Bay St (at Front St)', 'Toronto ON M5J 1... 4ba17563f964a520ceb837e3 [{'label': 'display', 'lat': 43.64565538508229... 43.645655 -79.377447 Union Station GO Bus Terminal Downtown Toronto M5J 1J5 bus ON
1 NaN Bus Line CA Toronto Canada NaN 719 ['Toronto ON', 'Canada'] 4fad4358e4b00911a7296917 [{'label': 'display', 'lat': 43.65597396661909... 43.655974 -79.384504 Bus To Niagara Falls Downtown Toronto NaN bus ON
2 NaN Bus Line CA Toronto Canada NaN 784 ['Toronto ON', 'Canada'] 4bda14603904a5934203459e [{'label': 'display', 'lat': 43.64577099686206... 43.645771 -79.376883 Bramalea Go Bus Downtown Toronto NaN bus ON
3 NaN Bus Line CA NaN Canada NaN 786 ['Canada'] 4b81e070f964a5204bc230e3 [{'label': 'display', 'lat': 43.64575296961094... 43.645753 -79.376729 Lincolnville GO Bus - Northbound Downtown Toronto NaN bus NaN
4 NaN Bus Line CA Toronto Canada NaN 835 ['Toronto ON', 'Canada'] 4bf5b4e894b2a593e424acee [{'label': 'display', 'lat': 43.64811312219104... 43.648113 -79.384790 TTC Bus #143 - Beach Express Downtown Toronto NaN bus ON

In [9]:
data_to_plot_on_map = []
for entry in downtown_data[['neighborhood', 'lat', 'lng', 'name']].iterrows():
    data_to_plot_on_map.append([entry[1][0], entry[1][1], entry[1][2], entry[1][3]])
print("Done, ", len(data_to_plot_on_map))


Done,  3408

In [13]:
geolocator = Nominatim(user_agent="coursera_capstone_project", timeout=1000)
general_location = geolocator.geocode('Toronto')
venues_map = folium.Map(location=[43.653908, -79.384293], zoom_start=15)

for entry in data_to_plot_on_map[:1000]:
    neighborhood, lat, lng, categories = entry
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color='red',
        popup=categories,
        fill = True,
        fill_color = 'red',
        fill_opacity = 0.6
    ).add_to(venues_map)
    
venues_map


Out[13]:

In [42]:
# data = pd.read_csv("foursquare_data.csv", index_col=0)

In [86]:
# To avoid repetitive ffoursquare calls, we save the results of the merged dataframe to a csv file
data.to_csv("foursquare_data.csv")

In [43]:
pd.read_csv("foursquare_data.csv", index_col=0).shape


Out[43]:
(4554, 17)

Clustering


In [68]:
set(neighborhood)


Out[68]:
{' Central Toronto',
 ' Downtown Toronto',
 ' North York',
 ' Scarborough',
 ' West Toronto',
 ' York'}

In [73]:
# manual one hot encoding
y_true = []
encoded_neighborhood = {' Central Toronto':0,
                       ' Downtown Toronto':1,
                       ' North York':2,
                       ' Scarborough':3,
                       ' West Toronto':4,
                       ' York':5}
for n in neighborhood:
    y_true.append(encoded_neighborhood[n])
y_true[-4]


Out[73]:
1

In [87]:
X = []
for entry in zip(lng_data, lat_data):
    X.append(list(entry))
X = np.array(X)
X


Out[87]:
array([[-79.25774002,  43.77440618],
       [-79.25672181,  43.77469746],
       [-79.25910865,  43.77488906],
       ...,
       [-79.3846944 ,  43.6612227 ],
       [-79.384738  ,  43.657834  ],
       [-79.37235228,  43.6534796 ]])

In [ ]:


In [88]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()  # for plot styling
import numpy as np
from sklearn.cluster import KMeans


kmeans = KMeans(n_clusters=6)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')

centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Clusters of Neighborhoods in Toronto')
plt.savefig('Clusters of Neighborhoods in Toronto.png')
plt.show()



In [89]:
from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rseed=2):
    # 1. Randomly choose clusters
    rng = np.random.RandomState(rseed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]
    
    while True:
        # 2a. Assign labels based on closest center
        labels = pairwise_distances_argmin(X, centers)
        
        # 2b. Find new centers from means of points
        new_centers = np.array([X[labels == i].mean(0)
                                for i in range(n_clusters)])
        
        # 2c. Check for convergence
        if np.all(centers == new_centers):
            break
        centers = new_centers
    
    return centers, labels

centers, labels = find_clusters(X, 4)
plt.scatter(X[:, 0], X[:, 1], c=labels,
            s=50, cmap='viridis');
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Clusters of Neighborhoods in Toronto')
plt.savefig('Clusters of Neighborhoods in Toronto Reduced clusters.png')
plt.show()


Result

Downtown Toronto has more places in Toroto when compared to all remaining neighborhoods. So when a person wants to relocate based upon all the avilable public amenities they can recognize that Downtown Toronto is the best of all the avilable options.